ABSTRACT

This paper addresses four issues in the design and
execution of behavioral observation in classrooms. These four issues
relate to the consequences of using different observation intervals,
schedules of observation, student sampling methods, and definitions
of on-task and off-task behavior for reliability, means, and
correlations of time-on-task and achievement. A field study observed
108 students in 18 elementary classrooms. Pre- and post-achievement
data were also collected. The data permit simulations of different
intervals, schedules, sampling methods, and definitions for
determination of their effects on the outcomes of behavioral
observation. Findings suggested that: (1) altering definitions of
time-on-task to include momentary off-task behaviors affected the
conclusions for the importance of time-on-task; (2) sampling segments
of instruction would tend to obscure the positive results for
time-on-task; (3) reducing the number of days of observation also
weakened the effects of time-on-task; (4) timing of the observation
was not very important for the noted effects; and (5) reducing the
number of students to less than six may adversely affect reliability.
(Author/PL)

Time-On-Task: Issues of
Timing, Sampling and Definition



Abstract 



This paper addresses four issues in the design and execution 
of behavioral observation in classrooms. These four issues relate 
to the consequences of using different observation intervals, 
schedules of observation, student sampling methods, and definitions 
of on-task and off-task behavior for reliability, means, and 
correlations of time-on-task and achievement. A field study
observed 108 students in 18 elementary classrooms. Pre- and
post-achievement data were also collected. The data permit simulations
of different intervals, schedules, sampling methods, and definitions
for determination of their effects on the outcomes of behavioral
observation.

Introduction

Research interest has recently focused on the centrality of
time-on-task for understanding classroom effects and effectiveness (Fisher
et al., 1976a, 1976b; Filby and Marliave, 1977; McDonald and Elias,
1976; Cooley and Leinhardt, 1978). This research has provided important
evidence that links classroom practices, time-on-task and learning outcomes.
Although the evidence in general points to positive and meaningful effects
of time-on-task, the results are not consistent across studies nor
across grade levels and subject matters within studies (e.g., the results
obtained in the Beginning Teacher Evaluation Studies, BTES, for
mathematics/reading at grades 2/5). Moreover, the effects documented
for time-on-task, although positive, have not been uniformly large.¹
Nonetheless, the effects for time factors have assumed appreciable 
stature by virtue of the fact that time factors can be altered, whereas 
more statistically important factors, such as family background or 
entering aptitude, are difficult or even impossible to alter. 

Thus, the use of time in classrooms continues to be a central theme
in educational research. The fact that the results are modest and
inconsistent has been attributed to particular methodological or research
design problems, not problems with the assumptions guiding the research.
That is, the assumption that classroom practices have appreciable impacts
on time-on-task which in turn affect the degree of learning is generally
not at issue. The present state of encouraging but not entirely clear
results is taken to indicate the existence of methodological as opposed
to theoretical problems.²

Given this slant on the problem, it seems reasonable to ask to
what extent the nature of the present findings is due to particular
methodological choices or decisions. In particular, it seems useful
to explore how the observation scheme used, the timing of the observation,
the length of the observation and the number of observations may affect
the detection of time-on-task effects. This paper addresses these
issues by using an existing set of observational data and manipulating
it to conform to alternative sampling, timing and definitional choices.
We examine the effects of alternate choices in five areas:

1. definition of off-task behavior, 

2. length of observation visit 

3. days of observation 

4. scheduling of observation 

5. sampling of students for observation 

Data 

The data were collected in four elementary schools in a rural 
Maryland school district. Subjects were students in grades 2-5 in 
18 classes taught by 12 teachers. All students were pre-tested
in February 1978 in reading, language arts, math and social studies
using the Comprehensive Test of Basic Skills. Students in each class
were assigned to the top third, middle third, or lower third of the
class based on the pre-test information, and two students (one boy and
one girl) were chosen from each third for observation. The observations
were thus conducted for six students per class, 108 students in total,
through the second semester of 1978, and the post-test was given in
May 1978.

Students were observed during their mathematics classes, which
averaged 50 minutes. Each classroom was observed for at least nine
days, and some for as many as twenty-one days. The observers recorded
three pieces of information for all six students during a thirty-second
interval: the nature of the task (procedural, seatwork, or lecture);
the student's response to the task (on-task, off-task or no task
opportunity); and the content of the instruction (e.g., two-digit
multiplication, or going over p. 147).

All six students were observed in a predetermined order every 
thirty seconds. To determine on- or off-task behavior, the observer
took a quick look at the student's behavior and recorded the response
at that particular instant. The observers were trained not to dwell
on deciding whether a behavior was on- or off-task, but to record
their first impression in accordance with established definitions of on- 
and off-task responses. 

On average, 100 observations per day were recorded for each student,
detailing the task, the content of instruction, and the response.
Counting all the daily observations, we logged on average about 1000
observation points for each student in the sample, or about 110,000
observation points total.

Because of the size of the data base, we entered the task, content
and response codes in a summary form which maintained the essentials
of the information. Each entry pertained to a specific task or activity
and gave the number of seconds each child was on- or off-task during
that time. For example, if the class were involved in seatwork during
the first ten minutes of the class and then the teacher explained the
seatwork during the next eight minutes, we created two entries, one
detailing the on/off task behavior during seatwork, and the other
giving the on/off task behavior for each of the six children during the
teacher-directed activity. From these data, a "day" record was
constructed which summarized the daily task, content and response patterns
for each child.
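
The summary form described above can be pictured with a small sketch. It is only an illustration: the field names, the child identifier, and the numbers of seconds are hypothetical, and the original coding sheets may have been organized differently. Each entry holds one child's on- and off-task seconds during one activity, and the "day" record sums the entries for each child and day.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One summary entry: a single child's behavior during one activity."""
    day: int
    child: str
    task: str       # e.g. "seatwork" or "lecture" (hypothetical labels)
    content: str    # e.g. "two-digit multiplication"
    on_seconds: int
    off_seconds: int


def day_records(entries):
    """Sum on- and off-task seconds per (day, child) and attach the on-task rate."""
    totals = defaultdict(lambda: [0, 0])
    for e in entries:
        totals[(e.day, e.child)][0] += e.on_seconds
        totals[(e.day, e.child)][1] += e.off_seconds
    return {
        key: {"on": on, "off": off, "rate": on / (on + off) if on + off else None}
        for key, (on, off) in totals.items()
    }


# Hypothetical numbers: ten minutes of seatwork, then eight minutes of explanation.
entries = [
    Entry(1, "child_A", "seatwork", "two-digit multiplication", 480, 120),
    Entry(1, "child_A", "lecture", "two-digit multiplication", 450, 30),
]
records = day_records(entries)
print(records[(1, "child_A")])
```

Keeping seconds rather than rates in each entry means that rates for any later grouping (by task, by day, by child) can be recomputed from the same entries.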

In addition, a special data set containing each 30 second record 
of task, content and response was compiled for five of the eighteen 
classrooms. These supplemental data will be used along with the basic 
data in the analyses. 

Definition of On- and Off-Task

On- and off-task behaviors were coded during instructional portions
of the lesson only. However, a child could also have a response other
than on- or off-task during instructional time. The diagram below
depicts the different categories and when they could occur in this
observational scheme.



                    Allocated Time
                   /              \
      Procedural Time          Instructional Time
                              /        |        \
                         Other        Off-       On-
                         response     Task       Task



The allocated time was the clock time scheduled for the mathematics 
class. Procedural time included any time spent lining up, receiving 
instructions, being involved in disciplinary action, going to fire
drills, being interrupted by the P.A. system and the like. Instructional
time pertained to the time spent specifically on mathematics instruction;
discussion of world events, elections, snow storms and other material
not pertaining to math was not coded as instructional time. On-task
behavior was defined as behavior appropriate to the task at hand. The
definition of appropriate behavior depended upon the task and specific
rules of the classroom. "Other response" was used to cover situations
in which the child was not on-task but was not off-task either. Such
situations arose when the child was sharpening a pencil, walking to 
another part of the room to obtain new materials, waiting for the 
teacher to help with a problem, or doing some other activity because 
the original assignment was finished. 

We focused on two particular problems in assessing off-task behavior. 
One involved the effect of including momentary off-task behavior; 
the other involved the effect of including no-task-opportunity time
(i.e., "other response") as off-task behavior.

a. Momentary off-task behavior

During any class period, children may gaze out the window, fidget,
or otherwise be momentarily distracted. On the one hand, this
momentary off-task time can be looked upon as insignificant for the
learning process. On the other hand, momentary off-task behavior may
be signalling declining attention and motivation and might therefore
be important for understanding the learning process. The issue is
whether these flickers of inattention should be treated as measurement
noise and thus ignored or as true indications of the underlying variable
of interest, engagement with learning. The final decision in this
matter depends upon conceptualizations of "engagement"; here, we
simply examine the measurement consequences of the choice made. We
simulated the "noise" decision by changing all off-task behavior of
less than one minute to on-task behavior in the supplemental sample of
five classrooms. The average rate of on-task behavior increased from
.79 to .83 while the standard deviation was reduced from .08 to .07.
Including the momentary off-task behavior yielded correlations of .24
between on-task and pre-test score and .45 with post-test score.
Excluding momentary off-task behavior, these correlations became .33
and .29. We carried out regressions of post-test on pre-test and the
alternative measures of on-task behavior. Using the measure which
excluded momentary off-task behaviors produced more modest results
(p < .10) than did using the more inclusive measure (p < .05).
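
The recoding step can be sketched as follows. This is a minimal illustration and not the procedure actually run on the data: it assumes that each instantaneous code stands for its full thirty-second interval, so that a run of consecutive off-task codes lasting less than sixty seconds (a single isolated code) counts as momentary. The function names and the sample sequence are hypothetical.

```python
def recode_momentary(codes, interval_seconds=30, threshold_seconds=60):
    """Return a copy of `codes` with short off-task runs recoded as on-task.

    `codes` is one student's sequence of interval codes, "on" or "off".
    A run of consecutive "off" codes whose assumed duration is below
    `threshold_seconds` is treated as noise and rewritten as "on".
    """
    result = list(codes)
    i = 0
    while i < len(result):
        if result[i] != "off":
            i += 1
            continue
        j = i
        while j < len(result) and result[j] == "off":
            j += 1  # j ends one past the last "off" in this run
        if (j - i) * interval_seconds < threshold_seconds:
            result[i:j] = ["on"] * (j - i)
        i = j
    return result


def on_task_rate(codes):
    """Share of intervals coded on-task."""
    return sum(c == "on" for c in codes) / len(codes)


# Hypothetical five-minute stretch for one student.
observed = ["on", "off", "on", "on", "off", "off", "on", "on", "on", "off"]
recoded = recode_momentary(observed)
print(on_task_rate(observed), on_task_rate(recoded))
```

Under these assumptions the two isolated off-task codes are absorbed, while the two-interval run is kept as off-task. A different threshold, or a different assumption about how long an instantaneous code lasts, would change which runs count as momentary.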

Whether to include momentary inattention or not should be based on
the particular model of learning one has formulated. Certain views
of the learning process may be compatible with inclusion of these
momentary distractions; other views would not be. The present exercise
was not intended to shed light on whether a particular point of view
is proper or improper, but to illustrate that the methodological
decision to include/exclude these flickers of inattention affected the
results obtained.

b. Other response and off-task behavior 

The dichotomy of on- or off-task provides a working categorization 
of student responses to instruction, but there are numerous ambiguous 
situations in which the student is not on-task, yet could not be considered 
off-task. For example, a student may have finished an assignment and 
have nothing more to do. Students who finish early are likely to be
those who need less time, i.e., have more aptitude for the particular
task at hand; thus the amount of finished time should be positively
related to achievement, in contrast to the negative relationship of
off-task variables and achievement. In our data, the correlation between
finished time and post-test score was .19 while the correlation between
off-task and post-test score was -.28. In regressions (not detailed
here) in which finished time was included with off-task time, the effects
of off-task time were diminished appreciably.

Length of observation visit

An important design consideration is the length of the observation
period. One could observe a single classroom all day long, for some
fixed fraction of the day, or for some specific instructional program.
Or, a combination of these lengths of observation might be used.
Because our interest was in how the use of time affects mathematics
achievement, we observed students during their entire mathematics
instruction. It was not possible (given our budget constraints on
observer time) to observe all teachers within a school. An alternate
decision would have been to observe more teachers, but for some smaller
segment of their mathematics instruction. We might have decided, for
example, that instead of visiting one teacher for sixty minutes we might
have used one of the combinations below:

NO. TEACHERS    NO. MINUTES    TOTAL TIME

     2              30             60
     3              20             60
     6              10             60

The choice among these alternatives is basically between getting 
enough classrooms to provide stable estimates of the effect of time- 
on-task, and scheduling sufficient time to ensure that the observed 
behavior is representative. If time-on-task is distributed fairly
uniformly across the day or the period of instruction, then a time
sample may be entirely adequate. Table 1 gives the means and standard
deviations of time-on-task for nine days of observation in one classroom.
The first columns provide statistics for the first 10 minutes of class,
the second for the first 20 minutes, and the third for the first 30 minutes.

Table 1 About Here

The overall mean for the time period is supplied as well as the mean for
the particular 10 minute segment. The average on-task time in this
class was markedly higher during the first 10 minutes of instruction
than it was for the next 10 or 20 minutes. Clearly, the timing of
observation in this classroom was important for the results obtained
as time-on-task was not distributed evenly across the mathematics class
time. Other classes started off with lower on-task rates, seemed to
warm up to instruction and have higher on-task rates, and then to
die down. Still other classrooms had no consistent pattern at all.
Consequently, it is difficult to predict what the effect in general
would be if selected portions only of the class time were observed.
Thus, although the effect of observing shorter periods may not be
consequential for the reliabilities obtained (see Rowley, 1976), how
those periods are selected may be very consequential.

To illustrate this point, we regressed post-test achievement
scores on pre-test scores and alternate measures of on-task rate,
namely measures from the first ten, twenty, thirty and fifty minutes
of instruction. The F values obtained for the time-on-task measures
were .010, 1.22, 1.09 and 4.34, respectively. The n of this sample
was extremely small (22 students); however, the results suggest that
observing for shorter segments would have appreciably altered the effects
obtained.³

Table 1

On-Task Rate for Selected Portions
of Mathematics Instruction
in One Classroom

                       Minutes 1-10     Minutes 1-20     Minutes 1-30
Day                    Mean     SD      Mean     SD      Mean     SD

1                      .906    .066     .878    .046     .865    .051
2                      .911    .086     .815    .059     .805    .092
3                      .922    .078     .923    .079     .921    .084
4                      .739    .236     .818    .062     .775    .062
5                      .817    .002     .690    .158     .653    .195
6                      .889    .087     .869    .073     .804    .099
7                      .884    .169     .866    .188     .809    .175
8                      .958    .066     .928    .063     .889    .088
9                      .825    .196     .841    .148     .850    .171

Mean                   .872             .848             .819
Mean (this segment)    .872             .824             .761
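
The two summary rows of Table 1 are linked by simple arithmetic, which the sketch below works through. It assumes that each ten-minute segment carries equal weight in the cumulative means; the function name is hypothetical. Under that assumption the mean for a single segment can be recovered from successive cumulative means.

```python
def segment_means(cumulative_means):
    """Recover per-segment means from cumulative means over equal-length segments.

    cumulative_means[k] is the mean over the first k+1 segments, so the
    mean of segment k+1 alone is (k+1)*c[k] - k*c[k-1].
    """
    segments = []
    for k, c in enumerate(cumulative_means):
        previous_total = k * cumulative_means[k - 1] if k > 0 else 0.0
        segments.append((k + 1) * c - previous_total)
    return segments


# Overall means from Table 1 for minutes 1-10, 1-20 and 1-30.
cumulative = [0.872, 0.848, 0.819]
print([round(m, 3) for m in segment_means(cumulative)])
# → [0.872, 0.824, 0.761]
```

The recovered values agree with the "this segment" row of the table, which shows that the cumulative columns understate how far the on-task rate falls in the later segments.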

Altering the number of days of observation 

Conventional wisdom has it that about ten days of observational data 
should be sufficient to accurately portray the activities of a classroom. 
However, few studies have investigated the effects of observing classrooms
for fewer or more days, even though this question is of considerable 
design and practical importance. If we can obtain sufficient information 
in a shorter period (e.g. five days instead of ten), it would be possible 
to observe substantially more classrooms without appreciably altering 
the observation costs. 

We observed some classrooms for as many
as 21 school days and others for as few as 9 days. With these data,
then, we can pretend that we had observed a fixed number of days (e.g.
3, 4, 5, 6, 7, 8, 9) and assess how this observation schedule would have
affected the detection of effects of time-on-task on achievement. We
think of time-on-task as a variable which is influenced not only by
an individual child's disposition, aptitude, and idiosyncrasies, but
also by the instructional setting in the class and by external events
such as the daily weather. Each child may have a stable rate of on-task
behavior with daily fluctuations depending upon his response to the
classroom and other environmental settings. Given this view of
time-on-task as a variable, a natural way to capture the daily and individual
variation is to view each day's time-on-task as an item in a scale of
total time-on-task. We can then see how consistent the behavior is
across a differing number of days or items in the scale.
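
Treating each day as an item leads to coefficient alpha as the index of consistency. The sketch below shows one way such a calculation could be set up; it is not the program used in the study, and the function names and the small table of rates are hypothetical. Each row is one student and each column one day of observation.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a students-by-days table of on-task rates.

    scores[s][d] is student s's on-task rate on day d; each day is one item.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two days (items)")
    # Variance across students of each day's rate.
    item_variances = [pvariance([row[d] for row in scores]) for d in range(k)]
    # Variance across students of the summed scale.
    total_variance = pvariance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("students do not differ on the total scale")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def alpha_for_first_days(scores, n_days):
    """Alpha computed as if only the first n_days days had been observed."""
    return cronbach_alpha([row[:n_days] for row in scores])


# Hypothetical on-task rates for four students over five days.
rates = [
    [0.90, 0.85, 0.92, 0.88, 0.91],
    [0.75, 0.80, 0.70, 0.78, 0.74],
    [0.82, 0.79, 0.88, 0.80, 0.85],
    [0.60, 0.72, 0.65, 0.70, 0.62],
]
for n in (3, 4, 5):
    print(n, round(alpha_for_first_days(rates, n), 2))
```

Truncating the table to the first few columns mimics a shorter observation schedule for the same students, which is the comparison the argument calls for.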
As expected, increasing the number of days does provide an overall
increase in reliability. The median coefficient alphas obtained for
3-9 days were .54, .57, .71, .73, .79, .81, .82. Whether the increase
in reliability obtained from observing nine days vs. five days is
consequential depends on the effect one is trying to document. Because
reliability determines the maximal correlation that one can find between 
achievement outcomes and time-on-task, the obtained reliability is of 
some consequence. To assess the effect of these variations in reliability, 
we used the third grade sample (n=36) and regressed post-test CTBS score 
on pre-test and alternate measures of time-on-task, namely measures 
obtained from: 

1. five days observation 

2. nine days observation 

3. eighteen days observation 
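
The point that reliability caps the attainable correlation can be made concrete with the median alphas reported for three to nine days. The sketch below applies the classical test theory bound, under which an observed correlation cannot exceed the square root of the product of the two reliabilities. Treating the achievement test as perfectly reliable is an assumption made here only to isolate the effect of the observation scale; the function name is hypothetical.

```python
import math

# Median coefficient alphas reported in the text for 3 to 9 days of observation.
median_alpha = {3: .54, 4: .57, 5: .71, 6: .73, 7: .79, 8: .81, 9: .82}


def max_observed_correlation(reliability_x, reliability_y=1.0):
    """Upper bound on |r| between two measures under classical test theory."""
    return math.sqrt(reliability_x * reliability_y)


# Assumes a perfectly reliable achievement test (reliability_y = 1.0).
ceilings = {
    days: round(max_observed_correlation(alpha), 2)
    for days, alpha in median_alpha.items()
}
print(ceilings)
```

Any unreliability in the achievement measure would lower these ceilings further, so the values are best read as optimistic limits.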

Table 2 shows that had we observed for the first five or first nine
days our effects for time-on-task would have been much more modest. The
"18-day" results show significant effects for on-task minutes, engagement
rate and off-task rate. Had we observed the same classrooms, but for
fewer days, the results obtained would have been much weaker.

Table 2 About Here

It is possible that the days at the end of the observation were 
significantly different from the days at the beginning; if this were 
the case we would be witnessing an effect for timing and not for length. 
This issue is explored in the next section. 




Table 2

Comparison of Results Obtained
for Time-on-Task Using 5, 9, and 18
Days of Observation

                       18 days             9 days              5 days
                    b/beta     F        b/beta     F        b/beta     F

time-on-task         47.51    6.56       33.62    3.21       32.25    3.62
rate                (.178)  p < .05     (.129)  p < .10     (.138)  p < .10

time-on-task          .249    5.14        .156    2.28        .131    2.19
minutes             (.165)  p < .05     (.111)   n.s.       (.11)    n.s.

time-off-task       -48.1     4.33      -32.9     2.07       -.109    .141
rate               -(.147)  p < .05    -(.10)    n.s.      -(.03)    n.s.

time-off-task        -.450    2.86       -.329    1.39     -16.76     .459
minutes            -(.121)   n.s.      -(.09)    n.s.      -(.05)    n.s.

Timing of observation days 

Throughout the school year, there are no doubt more intensive and less 
intensive periods of instruction. At the beginning of the school year, for 
example, much of the instructional time may be spent in review or in
establishing classroom rules and norms for behavior. In many urban schools, with
high rates of student mobility, the first six weeks are needed to stabilize
the school enrollment. Because of the constant transferring in and out of
classrooms, this early part of the semester is often an instructional loss.
Other examples of uneven distributions of effort throughout the school year
are the days immediately before and after major holidays, such as Christmas
and Easter vacations. These sources of differences in time-on-task are
predictable and probably similar from class to class. In addition, there 
may be different levels of seriousness in the classroom, depending upon the 
amount of material the teacher has covered and the amount she expected to 
cover by that point in time. This variation in the teacher's expectation 
for levels of attentiveness would then not likely be the same from class to 
class. 

We are able, in a limited fashion, to see if time on-task differs by 
time of year, using these data. For four classrooms we observed students 
for a ten day period in February and also in May. The means and standard 
deviations for these classrooms are provided in Table 3 for the two different 
time periods. Table 3 also provides the reliabilities for the two periods 
of observation (columns 5 and 6) and for two mixed scales S1 and S2 composed
of equal numbers of items from February and May. The reliabilities and the
means do not appear to be very different for the two time points. This
table supplies limited evidence of the consistency of the classroom over
time, which suggests that the timing of the observational period may not be
all that consequential. It also suggests that our failure to find significant
effects for time-on-task using only nine days of observational data
was most likely due to the decreased reliability of the scale and not
due to scheduling effects.

Table 3

Comparison of Mean Values and Reliabilities Obtained
for Time-on-Task in February and May

 Feb.     May      Feb.    May     Combined    S1      S2
 Means    Means    α       α       α           α       α

 .844     .856     .92     .96     .97         .93     .94
 .899     .900     .76     .72     .71         .70     .53
 .929     .930     .67     .76     .85         .70     .70
 .842     .847     .85     .76     .79         .16     .71

Altering the number of students sampled in the classroom 

Another decision which has to be made is whether to observe all
students in the room or to follow a sample of students. Whether to
observe the entire classroom or selected students depends largely on
the purpose of the observation. If one is interested primarily in how
classroom organization affects time-on-task, the entire class would
probably be observed. Other strictly pragmatic elements such as high
absenteeism or sensitivity of identifying students for observation may
influence this decision.

Given that the practical and theoretical concerns dictate that
sampling should take place, the question is how many students are
needed to obtain a reliable estimate of the on-task behavior for the
class. We can examine this issue in two ways with these data. In one
classroom, we actually observed twelve students as opposed to six, and
comparing the class means, standard deviations and reliability
obtained for these six vs. twelve shows them to be very similar
(x̄₁₂ = .87, x̄₆ = .86, r₁₂ = .92, r₆ = .89). Another way we can
focus on this issue is by reducing the number of students and comparing
the obtained reliabilities. We used a random selection of three of
the six students to assess the effect this sampling might have on
reliability. The median reliabilities were not appreciably reduced
by selecting only three students.⁴ However, given the fragility of
time-on-task effects which we have documented here, it would seem
worthwhile to keep reliability as high as possible. In this instance,
observing six students would seem desirable.

Summary and discussion

This paper has examined how various methodological decisions may 
influence studies of the effect of time-on-task on achievement. We 
found that altering definitions of time-on-task to include momentary 
off-task behaviors affected the conclusions for the importance of time- 
on-task. We found clear evidence that sampling segments of instruction 
would tend to obscure the positive results for time-on-task. We further 
showed that reducing the number of days of observation also weakened 
the effects of time-on-task. The timing of the observation was not very 
important for the noted effects, however. Finally, we explored the 
effect of sampling differing numbers of students, and suggested that 
reducing the number of students to less than six may adversely affect 
reliability. 

The findings in this paper suggest that although there is an 
understandable urge to lessen the observation time in order to bolster 
the number of settings observed, such steps should only be taken 
cautiously. Whether the effects detected and not detected here are
bound up with the particulars of this observation study can only be 
determined by more systematic examination of these methodological issues. 
In this sense, we hope the paper serves more as a source of what the 
question might be than of what the answer is. What this paper does 
show is that methodological decisions, including some that appear 
quite minor, can have major consequences for the conclusions that are
drawn from observational data.

Notes

1. A typical finding has been that time-on-task when added to a
regression of post-test on pre-test will increment R² by about 3 percent.
Although increments to R² provide a conservative view of the importance
of a variable, other indicators, such as the magnitude of the beta
weight or the residual variance accounted for, have not been substantial
either.

2. An alternative perspective would be that the work is basically atheoretical
so that it is natural to fault the methodology.

3. For five of the eighteen classrooms, we coded each 30 second interval
of task, content and response. From this sample, the twenty-two
students who had complete test and observational data were used in the
regressions reported in this section.

4. The median reliabilities obtained for three students in comparison to
six students for three to nine days of observation are:

              3 days  4 days  5 days  6 days  7 days  8 days  9 days

3 Students     .43     .65     .63     .63     .71     .77     .81
6 Students     .54     .57     .71     .73     .79     .81     .82